The microbial world holds vast potential for advancements in diverse fields such as food production, human health, and ecological sustainability. PanKB is a pangenomic knowledgebase that aims to help practitioners leverage microbial functions beyond those of a select few model organisms. PanKB provides pangenome analytics, alleleome analytics, several ways to access its data, and an AI Assistant for the pangenomic literature.
PanKB includes multiple interactive analytics and tables that give an overview of a pangenome's contents. These analytics can be found on, or navigated to from, an individual pangenome's page, for example the Lactiplantibacillus plantarum pangenome page.
Pangenomes also serve as the foundation for further large-scale analyses, and PanKB is actively integrating their novel results. Recent pangenome-scale analyses of sequence variants, termed alleleomics, have demonstrated unique value in narrowing the search space of feasible genetic variants in E. coli. PanKB currently includes the alleleome of every pangenome it hosts. Alleleome analytics can be found on pangenome pages and on individual gene pages, for example those of Lactiplantibacillus plantarum.
PanKB offers several ways to access its data. Users can browse all of PanKB's data through its navigation links, quickly find specific data through the global search feature available on most pages, and download the raw data behind the various analytics hosted on database pages.
Combined, PanKB's features enable valuable workflows for enzyme and strain engineering. These include identifying genes for new enzyme production or for reintroduction into strains, pinpointing precise gene edits to modify activity, discovering and optimizing valuable pathways, and selecting optimal starting strains. Current strain engineering relies heavily on model organisms or familiar strains; PanKB's features and data enable strain engineers to start leveraging pangenomic data for targeted bioengineering.
Scientific progress often requires extensive literature review, a traditionally time-consuming process. Large Language Models (LLMs) offer a potential solution by aggregating and summarizing knowledge across documents. PanKB includes an LLM chatbot (AI Assistant) grounded in an open-access pangenomic bibliome. It is designed to answer in-depth questions on pangenomics accurately, cite relevant articles, and avoid hallucinating inaccurate content. This feature is an initial experiment in combining an LLM with a specialized scientific database to accelerate knowledge acquisition through automated knowledge extraction.
PanKB is built with the following tools and frameworks:
Pangenome Analysis: BGCFlow
Interactive Visualization: D3.js, Plotly.js, Highcharts, hotmap.js, MSAViewer
Front-end Web Frameworks: Bootstrap, jQuery
Back-end Web Framework: Django
Database: Azure Cosmos DB for MongoDB
If you have questions or find any bugs in the database, please contact Patrick Phaneuf.
This work was funded by the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF Grant Number NNF20CC0035580).